Summarization of Spoken Language — Challenges, Methods, and Prospects
Author
Abstract
While the summarization of written texts has been explored for many decades, and has gained significantly more attention in the last five to ten years, summarization of spoken language is a comparatively recent research area. As the number of spoken audio databases is growing rapidly, however, we predict that the need for high-quality summarization of the information contained in this medium will rise substantially. Summarization of spoken language may also aid the archiving, indexing, and retrieval of various records of oral communication, such as corporate meetings, sales interactions, or customer support. The purpose of this paper is to place summarization of spoken language in the context of general summarization research, to describe the main challenges it adds to the already difficult task of written text summarization, to survey past and current approaches and systems, and finally to provide a tentative outlook on future directions in the research and development of spoken language summarization systems.
Similar resources
Positional language modeling for extractive broadcast news speech summarization
Extractive summarization, with the intention of automatically selecting a set of representative sentences from a text (or spoken) document so as to concisely express the most important theme of the document, has been an active area of experimentation and development. A recent trend of research is to employ the language modeling (LM) approach for important sentence selection, which has proven to...
Problems and Prospects in Collection of Spoken Language Data
In this paper, we focus on the information in speech data and discuss the research issues involved in collecting, organizing, indexing, retrieving, and summarizing speech data. We share our experience about the problems and prospects in collection of spoken language data. We highlight some of the procedures and standards that need to be adapted in collecting the speech data, and discuss our pl...
Enhanced language modeling for extractive speech summarization with sentence relatedness information
Extractive summarization is intended to automatically select a set of representative sentences from a text or spoken document that can concisely express the most important topics of the document. Language modeling (LM) has been proven to be a promising framework for performing extractive summarization in an unsupervised manner. However, there remain two fundamental challenges facing existing LM...
Transcribing human-directed speech for spoken language processing
As storage costs drop and bandwidth increases, there has been a rapid growth of spoken information available via the web or in online archives, raising problems of document retrieval, information extraction, summarization and translation for spoken language. While there is a long tradition of research in these technologies for text, new challenges arise when moving from written to spoken langua...
Extractive Spoken Document Summarization with Representation Learning Techniques
The rapidly increasing availability of multimedia associated with spoken documents on the Internet has made automatic spoken document summarization an important research subject. Thus far, the majority of existing work has focused on extractive spoken document summarization, which selects salient sentences from an original spoken document according to a target summarization ratio and ...